1. Introduction

On March 17, 2020, President Trump referred to the coronavirus as the “China Virus.” Shortly after, the number of anti-Chinese incidents began to increase across the United States. One aspect of public health that is often cast to the wayside is how influential public officials and leaders are in disseminating public health information. Not only can their words change the public’s views on a health matter, but they can also shift a nation’s perspective on someone’s identity. [ADD SOME BACKGROUND LITERATURE THAT BACKS THIS CLAIM UP] In addition, given the influence of identity politics, we may expect the term “China Virus” to be more polarizing for certain identities and states. [ADD BACKGROUND LITERATURE + IDENTIFY SUBJECTS OF INTEREST] Few studies have been devoted to analyzing these aspects of the pandemic.

Thus, our project aims to explore the relationship and variability of interest in the term “China Virus” across states through political, demographic, and COVID-19 characteristics. We ask the question: [INSERT OUR RESEARCH QUESTION]? To answer this question, we look to gain a better understanding of [INSERT NAME OF BAYESIAN METHODOLOGY]. Ultimately, we hope our research inspires others to explore and better understand the impact language and words have on the public during times of crisis.


2. Data

2.1 Data Descriptions

2.2 Variables of interest

2.3 Visualizations [NEED TO WEAVE A COHESIVE MESSAGE BETWEEN THESE VISUALIZATIONS TO TELL A STORY]

2.3a Demographic

Finaldata$hover <- with(Finaldata, paste(polyname, "<br>","Positive Cases:", positive, "<br>","Negative Cases:", negative,"<br>",
                           "Total Cases:", totalTestResults,"<br>", "% White:", percent_white,"<br>",
                            "% Asian:", percent_asian, "<br>", "State Color:", StateColor,
                        "<br>", "Total Population:", total_population))
# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

fig <- plot_geo(Finaldata, locationmode = 'USA-states')
fig <- fig %>% add_trace(
    z = ~ChinaVirusInterest, text = ~hover, locations = ~State,
    color = ~ChinaVirusInterest, colors = 'Oranges',
    marker = list(line = l)  # apply the white state borders defined above
  )
fig <- fig %>% colorbar(title = "Interest")
fig <- fig %>% layout(
    title = '2020 Google Search Interest of "China Virus" by State <br>(Hover for additional information)', geo=g
  )

fig

fig <- plot_geo(Finaldata, locationmode = 'USA-states')
fig <- fig %>% add_trace(
    z = ~percent_white, text = ~hover, locations = ~State,
    color = ~percent_white, colors = 'Blues',
    marker = list(line = l)  # apply the white state borders defined above
  )
fig <- fig %>% colorbar(title = "% White")
fig <- fig %>% layout(
    title = '2020 Percent White by State <br>(Hover for additional information)', geo=g
  )

fig

Our first visualization looks at the percentage of residents who identify as white within the United States compared to those who identify as Asian. There is a higher percentage of white-identifying residents overall, most notably in the Midwest and Northeast. In regards to Asian-identifying residents, the percentage is practically below 0.1% per state, with the exceptions of California and New York. From this visualization we can also see that states like Texas, California, and New Mexico have much lower percentages of white-identifying residents, which could provide important information for our analysis.

2.3b Google China Virus

During the week of 2020-03-14 to 2020-03-21, Trump, in an official press announcement, labeled the coronavirus the “China Virus,” and we wanted to see how his comments affected search patterns across states.


This plot shows “China Virus” search interest over time, grouped by region. It shows that there are certainly key events that trigger an uptick in searches overall. From this plot it is not clear which region searches for “China Virus” more or less often, but it does show that the regions move together in search interest, which implies that federal-level events, such as a Donald Trump tweet, trigger these interest spikes.

As we can see from the density plot of China Virus interest during our time period (2020-03-14 to 2020-03-21), the distribution appears approximately normal with a small bump at 0. As a group, we believe this bump occurs because 0 is the lowest value the interest score can take, and this floor produces a pile-up of observations around 0. One could argue against a normal distribution, as the density looks somewhat right-skewed, but our team believes a normal distribution best describes it.


We can see that the variability in Google interest in the term “China Virus” has quite a large range between states. Very few states have high densities in the upper reaches of the interest scale, but there are some interesting density peaks among the lower values. For example, Alaska, Wyoming, and Iowa have unusual peaks around the 25-50 range. It is also interesting to note that there isn’t an obvious mean or median value of China Virus interest among the states.
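This by-state comparison can be drawn as a ridgeline-style density plot; a minimal sketch, assuming Finaldata contains one row per state-day with ChinaVirusInterest and State columns (the use of ggridges here is our assumption; any per-state density layer would work):

```r
library(ggplot2)
library(ggridges)

# One density of daily "China Virus" search interest per state
ggplot(Finaldata, aes(x = ChinaVirusInterest, y = State)) +
  geom_density_ridges(alpha = 0.7) +
  labs(x = "China Virus search interest", y = NULL)
```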

2.3c COVID-19

ggplotly(c)

This visualization depicts the distribution of positive COVID-19 cases by region and by which political party won the state in the 2016 election. We can see that Democratic states in the Midwest, Mountain, and West regions have a larger range and higher quantile metrics for positive cases overall. For the Northeast and South regions, the means of positive COVID-19 cases are higher, but not significantly so. This is an interesting pattern, as political party affiliation appears to interact with the number of positive cases by region.


3. Methods & Models (Feddy and Will)

Model 1 Repeated Measures Model


For our simplest model, we decided to use a repeated measures model. Our team decided that the repeated measures structure was a necessary component because of how our data is set up: each state has a value of ChinaVirusInterest for each day in our target period (2020-03-14 to 2020-03-21). Given the repeated measures and our prior understanding of the varying characteristics (demographic, political, COVID impact) of different states, this model allows us to capture these differences in ChinaVirusInterest with the \(\theta_i\) value, which represents each state’s mean.

Model Structure

\[\begin{aligned} Y_{ij}|\theta_i, \mu, \sigma_w, \sigma_b &\sim N(\theta_i,\sigma_w^2)\\ \theta_i|\mu,\sigma_b &\overset{ind}{\sim} N(\mu, \sigma_b^2)\\ \mu &\sim N(..., ...)\\ \sigma_b,\sigma_w &\sim Exp(...) \end{aligned}\]


\(Y_{ij} =\) ChinaVirusInterest for state \(i\) on day \(j\)
\(\theta_i =\) state \(i\)’s unique mean value
\(\sigma_w =\) within-state variation
\(\sigma_b =\) between-state variation

model_data <-
  Finaldata %>%
  mutate(Day = as.numeric(Day) - as.numeric(min(Day))) %>%
  select(ChinaVirusInterest, Day, percent_white, StateColor, positive, State)

# drop rows with missing values
model_data <- na.omit(model_data)

# Repeated measures model: one intercept per state
set.seed(454)
RM_model <- stan_glmer(
  ChinaVirusInterest ~ (1 | State),
  data = model_data, family = gaussian
)
## 4 MCMC chains sampled, 2000 iterations each (1000 warmup + 1000 sampling)

Model 2: Normal Regression

For our second model, we wanted to understand what made some states more responsive to the term “China Virus” than others. As described in our research motivation, we wanted to explore what explains the differences in interest in the term: is it a political, demographic, or COVID-impact-related difference? To do this, we used a Normal regression model with state-specific intercepts and slopes, with the following specification. For our demographic predictor we used percent_white, for our political predictor we used StateColor, and for our COVID-impact predictor we used positive (the number of positive cases).

\[ \begin{split} Y_{ij} | b_0, b_1, \beta_0, \beta_1,\beta_2,\beta_3,\beta_4, \sigma_w, \sigma_{0b}, \sigma_{1b} & \sim N( b_{0i} + b_{1i} X_{ij}+ \beta_2X_2 + \beta_3X_3 +\beta_4X_4, \; \sigma_w^2) \\ b_{0i} | \beta_0, \sigma_{0b} & \stackrel{ind}{\sim} N(\beta_0, \sigma_{0b}^2) \\ b_{1i} | \beta_1, \sigma_{1b} & \stackrel{ind}{\sim} N(\beta_1, \sigma_{1b}^2) \\ \beta_0,\beta_1,\beta_2,\beta_3,\beta_4 & \sim N(..., ...) \\ \sigma_w & \sim Exp(...) \\ \sigma_{0b} & \sim Exp(...) \\ \sigma_{1b} & \sim Exp(...) \\ \end{split} \]

\(Y_{ij} =\) ChinaVirusInterest for state \(i\) on day \(j\)
\(X_{ij} = \text{Day}\)
\(X_2 = \text{percent_white},\; X_3 = \text{StateColor},\; X_4 = \text{positive cases}\)
\(\sigma_w = \text{within-state variation}\)
\(\sigma_{0b}, \sigma_{1b} = \text{between-state variation in intercepts and slopes}\)
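The fit for this model is not shown in this chunk; a minimal rstanarm sketch, assuming the same model_data frame built for Model 1 is reused and priors are left at rstanarm defaults (matching the \(N(..., ...)\) and \(Exp(...)\) placeholders above):

```r
# Sketch only: (Day | State) gives each state its own
# intercept b_0i and Day slope b_1i
set.seed(454)
NR_model <- stan_glmer(
  ChinaVirusInterest ~ Day + percent_white + StateColor + positive + (Day | State),
  data = model_data, family = gaussian
)
```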




4. Model Evaluation (RESULTS)

4.1 Model 1 Evaluation (Repeated measures)

# Posterior densities of the within- and between-state standard deviations
g1 <- mcmc_dens(RM_model, pars = "sigma") + 
  labs(x = expression(sigma[w])) +
  lims(x = c(5, 25))
g2 <- mcmc_dens(RM_model, 
  pars = "Sigma[State:(Intercept),(Intercept)]", transformations = sqrt) + 
  labs(x = expression(sigma[b])) +
  lims(x = c(5, 25))
grid.arrange(g1, g2, ncol = 2)

In the output above, we see that the within-state deviation is much narrower than the between-state deviation. This matches our intuition that a repeated measures model with state-specific means will be able to explain a greater amount of the variation.

# Store the chains
rm_df <- as.array(RM_model) %>% 
  melt %>% 
  pivot_wider(names_from = parameters, values_from = value)

# Wrangle the chains
rm_df <- rm_df %>% 
  mutate(sigma_sq_w = sigma^2, sigma_sq_b = `Sigma[State:(Intercept),(Intercept)]`) %>% 
  mutate(correlation = sigma_sq_b / (sigma_sq_b + sigma_sq_w))

ggplot(rm_df, aes(x = correlation)) + 
    geom_density()


This correlation density shows that there is relatively strong correlation among the daily observations within a given state. Therefore, it is appropriate for us to use a model with state-specific intercepts in order to account for these inherent differences between states.
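The quantity whose posterior density is plotted from the chains is the within-state (intraclass) correlation implied by the repeated measures model, i.e. the correlation between any two daily observations \(j\) and \(k\) from the same state:

\[ \text{Corr}(Y_{ij}, Y_{ik}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2} \]

This matches the `sigma_sq_b / (sigma_sq_b + sigma_sq_w)` calculation in the code above: values near 1 indicate that most variation is between states rather than within them.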

4.2 Model 2 Evaluation (Normal Regression)

4.3 Model 3 Evaluation(Repeated Reg + Normal)

4.4 Model 4 Evaluation (Longitudinal)



5. Results (Will)

5.1 Posterior Predictions All States

5.1.a Table

5.2 Posterior Prediction One State

5.3 Final Model



6. Conclusion

6.1 Limitations

6.2 Future Work



7. Acknowledgments and References